Rank | Count | Beginning |
---|---|---|
97111 | 19479 | Il |
138386 | 16456 | La |
60945 | 10311 | E |
118735 | 7976 | In |
93288 | 7056 | I |
218121 | 6391 | Per |
201477 | 6175 | Non |
177577 | 5909 | Ma |
28 | 5437 | A |
160344 | 4716 | Le |
285381 | 4569 | Un |
260456 | 3705 | Si |
192886 | 3416 | Nel |
285427 | 3376 | Una |
170720 | 3070 | Lo |
267702 | 3035 | Sono |
252655 | 3002 | Se |
12772 | 2347 | Anche |
86529 | 2239 | Gli |
46671 | 2146 | Da |
36120 | 2126 | Come |
56763 | 2025 | Dopo |
39352 | 2004 | Con |
5816 | 1846 | Al |
240999 | 1709 | Questo |
253225 | 1697 | Secondo |
280322 | 1560 | Tra |
32836 | 1490 | Ci |
60953 | 1467 | E’ |
195121 | 1391 | Nella |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
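The same count can be sketched outside the database. This is a minimal illustration, not the script used to produce the table: it assumes the sentences are available as a plain list of strings, and the names `sentence_beginnings` and `sample` are hypothetical. Splitting on a single space and rejoining the first `n` pieces mirrors `substring_index(sentence, ' ', n)`, so the same function also covers N = 2, 3, 4.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent beginnings made of the first n words."""
    counts = Counter()
    for sentence in sentences:
        # Keep everything before the n-th space, as substring_index does;
        # a sentence with fewer than n spaces is counted as a whole.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # Sorted by descending count, like "order by cnt desc limit 50".
    return counts.most_common(limit)


# Hypothetical sample data, only for illustration.
sample = [
    "Il gatto dorme.",
    "Il cane corre.",
    "La casa è grande.",
]
top = sentence_beginnings(sample, n=1)
```

Calling `sentence_beginnings(sample, n=2)` yields the two-word beginnings instead, which corresponds to changing the last argument of `substring_index` in the query.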